This document was compiled for the project: PITE on date 2023-04-24 18:02:03 by Pamela Solano.
This work is motivated by Lamont et al. (2016) and Efthimiou et al. (2023). We aim to develop a model that reliably predicts the Predictive Individual Treatment Effect (PITE) and its heterogeneity. As is well known, a model may predict the outcome well (y.predictor) while predicting the treatment effect badly. We therefore want a model with adequate ability to predict the patient-level treatment effect while also recognizing the uncertainty arising from the treatment and control arms. This decomposition allows us to describe patient-level heterogeneity under treatment versus control. Appropriate metrics that capture the patient-level treatment effect can be developed via a Bayesian learning rule.
Define the probability of success for a patient treated with control by \(p_c\) and the corresponding probability of success for a patient given the new treatment by \(p_t\). The average treatment effect, or relative efficacy (Hampson et al. 2015), is expressed as the log-odds ratio defined by
\[\theta = \log[\frac{p_t/(1-p_t)}{p_c/(1-p_c)}]\]
Since the odds ratio \(\frac{p_t/(1-p_t)}{p_c/(1-p_c)}\) cannot be negative, it is bounded below (by zero) but not above, and so has a skewed distribution.
Note that \(\theta>0\) iff \(\frac{p_t/(1-p_t)}{p_c/(1-p_c)}>1\), iff \(p_t(1-p_c)>p_c(1-p_t)\), iff \(p_t>p_c\).
Given \(p_t\) and a chosen effect size we can deduce \(p_c\); the simulation below uses \(p_c = 1/(2+1/p_t)\), which always satisfies \(p_c < p_t\).
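As a quick numerical sketch of this deduction (the specific values match the simulation below):

```r
# Deduce pc from pt via the relation used in the simulation,
# then compute the log-odds ratio theta.
pt <- 0.10
pc <- 1 / (2 + 1 / pt)                          # = 1/12, about 0.0833
theta <- log((pt / (1 - pt)) / (pc / (1 - pc))) # about 0.2007
```

Since \(p_t > p_c\), \(\theta\) is positive, in line with the equivalence above.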
Consider the observation equation \(Y_i = f(X_i) + t_i P_i + e_i\), where \(f(X_i)\) is the linear predictor, \(t_i \in \{0,1\}\) indicates the two treatments, and \(P_i\) is the PITE term. The observation noise is \(e_i \sim N(0,\sigma^2)\), with \(\sigma^2\) the white-noise variance; the system, or level, equation is \(P_i \sim N(P,\tau^2)\), with \(\tau^2\) the treatment variance among individuals.
Note that although \(P_i\) is modeled as a random variable, it is signal (level), not noise. We assume that \(P_i\) is the level plus its own noise, which is not part of \(\sigma^2\) (it is \(\tau^2\)).
Simulation and parameter recovery for model 1 (M1). In particular, \(f(X) = X\beta = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}\).
Simulation M1
set.seed(417)
N <- 500
pt <- 0.10
pc <- 1/(2+(1/pt)); pc # probability of success for a patient treated with control
## [1] 0.08333333
re <- function(pt, pc) log((pt/(1-pt))/(pc/(1-pc))) # relative efficacy (log-odds ratio) helper
P <- re(pt=pt,pc=pc); P # average treatment effect
## [1] 0.2006707
pi <- rnorm(N,P,1); #pi # sd = 1, five times the mean: more noise than signal (masks base R's `pi`)
x1 <- rnorm(N,0,1); #x1 # standard intensity: continuous covariate
x2 <- rbinom(N,1,0.1); #x2 # sex: discrete covariate
erro <- rnorm(N,0,1); #erro # white noise
y.control <- 5 + 0.3*x1 + 0.2*x2 + erro # linear predictor plus noise
t <- rbinom(N,1,0.5); #t # treatment
y.obs <- y.control + t*pi
y.treat <- y.control + pi

Fitting M1 in stan
# Naive Model
library(rstan)
stan_code <- '
data {
int<lower=1> N; // total number of observations
int<lower=2> ncolX; // number of predictor columns
matrix[N,ncolX] X; // predictor design matrix
vector[N] t; // treatment
vector[N] Y; // response variable
}
parameters {
vector[ncolX] beta; // fixed effects (with intercept)
real P; // ATE (common pite mean)
vector[N] Pi; // individual pite (random effect)
// std deviations
real<lower=0> sdY; // Y std dev.
real<lower=0> sdbeta; // beta prior std dev.
real<lower=0> sdP; // Pi std dev.
}
model {
// prior:
beta ~ normal(0,sdbeta);
Pi ~ normal(P,sdP);
P ~ normal(0, 100);
//likelihood:
Y ~ normal(X * beta + t .* Pi, sdY);
}
'
load("M1-fit.rda")
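The fitted model is loaded from `M1-fit.rda`. As a hedged sketch (the design-matrix construction and sampler settings are assumptions, not taken from the original run), the fit could have been produced along these lines:

```r
# Sketch only: x1, x2, t, y.obs come from the simulation chunk above
# (re-drawn here with a placeholder response so the snippet is self-contained).
set.seed(417)
N <- 500
x1 <- rnorm(N, 0, 1)
x2 <- rbinom(N, 1, 0.1)
t  <- rbinom(N, 1, 0.5)
y.obs <- rnorm(N, 5, 1)                  # placeholder response
X <- cbind(1, x1, x2)                    # intercept + covariates
stan_data <- list(N = N, ncolX = ncol(X), X = X, t = t, Y = y.obs)
# fit <- rstan::stan(model_code = stan_code, data = stan_data,
#                    chains = 4, iter = 2000, seed = 417)
# save(fit, file = "M1-fit.rda")
```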
head(data.obs)
head(true.data)
head(samples)
head(Summary.mcmc)
library(ggjoy)
n.var = "beta"
{
r <- samples%>%
dplyr::filter(var==n.var)%>%
ggplot() +
geom_joy(aes(x = mcmc,y = par), scale = 1,show.legend = TRUE)+
scale_fill_grey(start = .4, end = .8)+
geom_point(data = true.data%>%dplyr::filter(var==n.var),aes( x = true, y = par), show.legend = TRUE,color="red")+
coord_flip() +
theme_joy()+
# theme(axis.text.x = element_text(size=15,angle=0,hjust = 0.5),
# axis.text.y = element_text(size=15),
# axis.title = element_text(size=15),
# strip.text = element_text(size=15))+
labs(title= "", #fill="Modelo",
x=parse(text=paste0(n.var,'~"Effects"')), y="")
}
r
n.var = "Pi"
{
r <- samples%>%
dplyr::filter(var==n.var)%>%
ggplot() +
geom_joy(aes(x = mcmc,y = par), scale = 1,show.legend = TRUE,rel_min_height = 0.01)+
scale_fill_grey(start = .4, end = .8)+
geom_point(data = true.data%>%dplyr::filter(var==n.var),aes( x = true, y = par), show.legend = TRUE,color="red")+
coord_flip() +
theme_joy()+
labs(title= "",
x=parse(text=paste0(n.var,'~"Effects"')), y="")
}
r
n.var = "P"
{
r <- samples%>%
dplyr::filter(var==n.var)%>%
ggplot() +
geom_joy(aes(x = mcmc,y = par), scale = 1,show.legend = TRUE)+
scale_fill_grey(start = .4, end = .8)+
geom_point(data = true.data%>%dplyr::filter(var==n.var),aes( x = true, y = par), show.legend = TRUE,color="red")+
coord_flip() +
theme_joy()+
labs(title= "",
x=parse(text=paste0(n.var,'~"Effects"')), y="")
}
r
require(dplyr)
mcmc_true <- full_join(Summary.mcmc,true.data)
auxN <- mcmc_true%>%
dplyr::mutate(biasR = ((value-true)/true)*100,
MSE = (value-true)^2,
Cover95 = ifelse( `2.5%`<true&true<`97.5%`,1,0))
Rtable.Saz <- auxN%>%
dplyr::filter(var!='Pi')%>%
group_by(par,var,id_var)%>%
summarise(mbiasR = mean(biasR),
mcv = mean(Cover95),
rMSE = mean(sqrt(MSE)))%>%print(n=100)
## # A tibble: 8 × 6
## # Groups: par, var [8]
## par var id_var mbiasR mcv rMSE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 P P 1 59.5 1 0.119
## 2 beta[1] beta 1 -0.838 1 0.0419
## 3 beta[2] beta 2 15.4 1 0.0463
## 4 beta[3] beta 3 64.2 1 0.128
## 5 lp__ lp__ 1 -127. 0 2539.
## 6 sdP sdP 1 12.9 1 0.129
## 7 sdY sdY 1 -32.2 0 0.455
## 8 sdbeta sdbeta 1 617. 0 6.17
Rtable.Saz
library(xtable)
# sink(file=paste0("Metrics_b.txt"))
print(xtable((Rtable.Saz),digits = 3),include.rownames = FALSE)
## % latex table generated in R 4.2.2 by xtable 1.8-4 package
## % Mon Apr 24 18:02:14 2023
## \begin{table}[ht]
## \centering
## \begin{tabular}{llrrrr}
## \hline
## par & var & id\_var & mbiasR & mcv & rMSE \\
## \hline
## P & P & 1.000 & 59.487 & 1.000 & 0.119 \\
## beta[1] & beta & 1.000 & -0.838 & 1.000 & 0.042 \\
## beta[2] & beta & 2.000 & 15.425 & 1.000 & 0.046 \\
## beta[3] & beta & 3.000 & 64.190 & 1.000 & 0.128 \\
## lp\_\_ & lp\_\_ & 1.000 & -126.959 & 0.000 & 2539.180 \\
## sdP & sdP & 1.000 & 12.939 & 1.000 & 0.129 \\
## sdY & sdY & 1.000 & -32.200 & 0.000 & 0.455 \\
## sdbeta & sdbeta & 1.000 & 617.433 & 0.000 & 6.174 \\
## \hline
## \end{tabular}
## \end{table}
# sink()

Conclusions
Assume that a desirable treatment decreases or stabilizes the response noise, in addition to producing a change in level. Since \(Var[Y_i\mid t_i]=\sigma^2 + t_i^2\tau^2\), model 1 implies that \(Var[Y_i\mid t_i=1]\geq Var[Y_i\mid t_i=0]\).
Therefore M1 cannot capture the situation in which the treatment noise is smaller than the control noise, since \(\tau^2 \geq 0\).
We expect, however, that the treatment noise is less than or equal to the control noise.
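A quick simulation sketch (parameter values chosen for illustration, not taken from the fitted model) confirms the variance ordering that M1 implies:

```r
# Under M1 with sigma = tau = 1, Var[Y|t=1] = sigma^2 + tau^2 = 2
# while Var[Y|t=0] = sigma^2 = 1, so the treated arm is always noisier.
set.seed(417)
n <- 1e5
sigma <- 1; tau <- 1; P <- 0.2
t  <- rbinom(n, 1, 0.5)
Pi <- rnorm(n, P, tau)          # individual treatment effects
e  <- rnorm(n, 0, sigma)        # white noise
y  <- 5 + t * Pi + e
var(y[t == 1])                  # approximately 2
var(y[t == 0])                  # approximately 1
```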
The relationship between \(\sigma^2_c = Var[Y_i\mid t_i=0]\) and \(Var[Y_i\mid t_i=1]\) is not clear.
In order to decompose this variability, we define M2 by taking \(e_i\mid t_i \sim N(0,\,\sigma^2 + (1-t_i)\sigma^2_c)\); then \(Var[Y_i\mid t_i] = \sigma^2 + t_i^2\tau^2 + (1-t_i)^2\sigma^2_c\).
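A minimal sketch of how the M2 likelihood could be written in Stan, following the M1 listing above (the parameter name `sdC` for \(\sigma_c\) is an assumption; the data block is as in M1):

```r
stan_code_m2 <- '
parameters {
  vector[ncolX] beta;     // fixed effects (with intercept)
  real P;                 // ATE (common PITE mean)
  vector[N] Pi;           // individual PITE
  real<lower=0> sdY;      // white-noise std dev (needs an informative prior)
  real<lower=0> sdC;      // extra control-arm std dev (assumed name)
  real<lower=0> sdbeta;   // beta prior std dev
  real<lower=0> sdP;      // Pi std dev
}
model {
  beta ~ normal(0, sdbeta);
  Pi ~ normal(P, sdP);
  P ~ normal(0, 100);
  // Var[Y|t] = sdY^2 + (1 - t) * sdC^2, plus tau^2 through Pi when t = 1
  Y ~ normal(X * beta + t .* Pi,
             sqrt(square(sdY) + (1 - t) * square(sdC)));
}
'
```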
M2 explicitly recognizes \(\tau^2\) (the treatment variance among individuals), \(\sigma^2_c\) (the control variance), and \(\sigma^2\) (the white-noise variance). Note that M2 needs prior information for \(\sigma^2\), or its prior has to concentrate around low values, to ensure identification.
One possibility is to set \(\sigma^2 \equiv 0\). M3: \(Y_i = f(X_i) + e_i\) with \(e_i\mid t_i \sim N(t_i P,\, t_i\tau^2 + (1-t_i)\sigma^2_c)\). This model lets us estimate both variance terms, \(\tau^2\) and \(\sigma^2_c\), so that the noise coming from the treatment arm or from the control arm can be recovered. However, model 3 implies there is no white noise, so the observed treatment responses are pure signal (level \(t_i P\)), which is not ideal. Even though M3 works in the simulation, the white-noise variance \(\sigma^2\) has to exist!
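Correspondingly, a hedged sketch of the M3 likelihood in Stan (only the model block is shown; the individual effects are integrated out, so only \(P\), \(\tau\) as `sdP`, and \(\sigma_c\) as `sdC` appear, and those names are assumptions):

```r
stan_code_m3 <- '
model {
  beta ~ normal(0, sdbeta);
  P ~ normal(0, 100);
  // mean t_i * P; variance t_i * tau^2 + (1 - t_i) * sigma_c^2
  Y ~ normal(X * beta + t * P,
             sqrt(t * square(sdP) + (1 - t) * square(sdC)));
}
'
```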
An advantage of M2 is that \(\tau^2\) contains more signal relative to noise compared with M3.
A disadvantage of M2 is the need for prior information on \(\sigma^2\), or a prior concentrated around low values, to ensure identification. Even so, M2 is more realistic.